Systems and Methods for Producing RNA Constructs with Increased Translation and Stability

ABSTRACT

Systems and methods for enhancing RNA translatability and stability are disclosed. Some embodiments describe RNA molecules exhibiting increased translatability and/or stability. Additional embodiments describe methods for screening RNA molecules for increased translatability and/or stability. Various embodiments utilize screening methods, including degenerate sequences, to identify sequences or regions that increase the translatability and/or stability of RNA molecules.

CROSS-REFERENCE TO RELATED APPLICATIONS

The current application is a divisional of U.S. patent application Ser. No. 17/463,466, filed Aug. 31, 2021, which claims priority to U.S. Provisional Patent Application No. 63/072,669, filed Aug. 31, 2020; the disclosures of which are hereby incorporated by reference in their entireties.

FIELD OF THE DISCLOSURE

The present invention relates to ribonucleic acid (RNA). More specifically, the present invention relates to systems and methods to enhance RNA translatability and assessment thereof.

INCORPORATION OF SEQUENCE LISTING

This application hereby incorporates by reference the material of the electronic Sequence Listing filed concurrently herewith. The material in the electronic Sequence Listing is submitted as an XML file entitled “06753 DIV_Seq_List.xml” created on Nov. 7, 2022, which has a file size of approximately 412 KB, and is herein incorporated by reference in its entirety.

BACKGROUND OF THE DISCLOSURE

There are multiple problems with prior methodologies of effecting protein expression. For example, introduced DNA can integrate into host cell genomic DNA at some frequency, resulting in alterations and/or damage to the host cell genomic DNA. Alternatively, the heterologous deoxyribonucleic acid (DNA) introduced into a cell can be inherited by daughter cells (whether or not the heterologous DNA has integrated into the chromosome) or by offspring.

In addition, assuming proper delivery and no damage or integration into the host genome, there are multiple steps which must occur before the encoded protein is made. Once inside the cell, DNA must be transported into the nucleus where it is transcribed into RNA. The RNA transcribed from DNA must then enter the cytoplasm where it is translated into protein. Not only do the multiple processing steps from administered DNA to protein create lag times before the generation of the functional protein, each step represents an opportunity for error and damage to the cell. Further, it is known to be difficult to obtain DNA expression in cells as DNA frequently enters a cell but is not expressed or not expressed at reasonable rates or concentrations. This can be a particular problem when DNA is introduced into primary cells or modified cell lines.

SUMMARY OF THE DISCLOSURE

This summary is meant to provide examples and is not intended to be limiting of the scope of the invention in any way. For example, any feature included in an example of this summary is not required by the claims, unless the claims explicitly recite the feature. Also, the features described can be combined in a variety of ways. Various features and steps as described elsewhere in this disclosure can be included in the examples summarized here.

In one embodiment, a method to determine RNA translatability includes obtaining a pool of RNA molecules, where each RNA molecule is uniquely encoded with a barcoding sequence and each barcoding sequence is flanked by at least one profiling sequence, transfecting a cell or cell lysate with the pool of RNA molecules, performing polysome profiling on the pool of RNA molecules to segregate RNA molecules based on the number of ribosomes bound to the RNA molecule, and isolating a first fraction from the polysome profile to generate a first set of RNA molecules showing a first level of ribosomes bound to the RNA molecules in the set of RNA molecules.

In a further embodiment, the method further includes sequencing the barcode sequence of each RNA molecule in the first set of RNA molecules to identify the presence of each RNA molecule in the first set of RNA molecules.

In another embodiment, the method further includes determining translatability of the RNA molecules associated with each barcode sequence in the fraction by identifying the prevalence of each barcode in the fraction.

In a still further embodiment, the RNA molecules are transfected into a collection of cells.

In still another embodiment, the collection of cells is selected from mammalian cells, yeast cells, bacteria cells, and plant cells.

In a yet further embodiment, the RNA molecules are added to a cell lysate.

In yet another embodiment, polysome profiling comprises adding a cell lysate to a sucrose gradient and centrifuging the sucrose gradient to segregate the RNA molecules.

In a further embodiment again, the barcoding sequence is selected from SEQ ID NOs: 115-1380.

In another embodiment again, the profiling sequence is selected from SEQ ID NOs: 1381-1382.

In a further additional embodiment, the method further includes isolating a second fraction from the polysome profile to generate a second set of RNA molecules showing a second level of ribosomes bound to the RNA molecules in the set of RNA molecules, where the first level and second level represent different amounts of bound ribosomes.

In another additional embodiment, the method further includes sequencing the barcode sequence of each RNA molecule in the first set of RNA molecules and the second set of RNA molecules to identify the presence of each RNA molecule in the first set of RNA molecules and the second set of RNA molecules.

In a still yet further embodiment, isolating a first fraction from the polysome profile includes isolating a plurality of fractions of the polysome profile, where each fraction in the plurality of fractions generates a set of RNA molecules showing a different level of ribosomes bound to the RNA molecules in that set of RNA molecules.

In still yet another embodiment, the method further includes sequencing the barcode sequence of each RNA molecule in each set of RNA molecules to identify the presence of each RNA molecule in each set of RNA molecules.

In a still further embodiment again, the method further includes generating a distribution for each RNA molecule based on the prevalence of each RNA molecule in each fraction.

In still another embodiment again, isolating a first fraction further comprises introducing a known amount of spike-in RNA molecule, wherein the spike-in RNA molecule serves as an internal reference to allow for quantification of the first set of RNA molecules.

In a still further additional embodiment, an RNA molecule for increased translation includes a 5′ untranslated region, a 3′ untranslated region, and a coding sequence, where the 5′ untranslated region is located 5′ of the coding sequence and the 3′ untranslated region is located 3′ of the coding sequence.

In still another additional embodiment, the coding sequence codes for a peptide of interest.

In a yet further embodiment again, the 5′ untranslated region is selected from SEQ ID NOs: 1-55 and SEQ ID NOs: 81-111.

In yet another embodiment again, the 3′ untranslated region is selected from SEQ ID NOs: 56-80.

In a yet further additional embodiment, the RNA molecule further includes a barcode sequence located 3′ of the coding sequence and at least one profiling sequence adjacent to the barcode sequence.

In yet another additional embodiment, the barcode sequence is selected from SEQ ID NOs: 115-1380 and the profiling sequence is selected from SEQ ID NOs: 1381-1382.

In a further additional embodiment again, a method to determine RNA stability includes obtaining a pool of RNA molecules, where each RNA molecule is uniquely encoded with a barcoding sequence and each barcoding sequence is flanked by at least one profiling sequence, treating the pool of RNA molecules under an experimental condition, and isolating the pool of RNA molecules at a specified timepoint to generate a fraction of RNA molecules showing stability under the experimental condition for the specified timepoint.

In another additional embodiment again, the method further includes sequencing the barcode sequence of each RNA molecule in the fraction to identify the presence of each RNA molecule in the fraction of RNA molecules.

In a still yet further embodiment again, the method further includes determining stability of the RNA molecules associated with each barcode sequence in the fraction by identifying the prevalence of each barcode in the fraction.

In still yet another embodiment again, the treating step includes transfecting the pool of RNA molecules into a collection of cells.

In a still yet further additional embodiment, the collection of cells is selected from mammalian cells, yeast cells, bacteria cells, and plant cells.

In still yet another additional embodiment, the treating step includes adding the pool of RNA molecules to a cell lysate.

In a yet further additional embodiment again, the experimental condition is selected from temperature, pH, presence of certain molecules, presence of certain ions, concentration of certain molecules, concentration of certain ions, irradiation, buffer type, and buffer concentration.

In yet another additional embodiment again, the method further includes size selecting for full-length RNA molecules.

In a still yet further additional embodiment again, size selecting includes performing reverse transcription PCR to transcribe a region from each RNA molecule into cDNA, wherein the region is selected from a full-length mRNA, a full-length CDS, a 5′UTR-CDS, a 3′UTR-CDS, and the barcode.

In still yet another additional embodiment again, the isolating step further includes isolating the pool of RNA molecules at a second specified timepoint to generate a second fraction of RNA molecules showing stability under the experimental condition for the second specified timepoint.

In another further embodiment, isolating the pool of RNA molecules further includes introducing a known amount of spike-in RNA molecule, where the spike-in RNA molecule serves as an internal reference to allow for quantification of the fraction of RNA molecules.

In still another further embodiment, a method for identifying RNA molecules possessing increased translatability and stability includes obtaining a pool of RNA molecules, where each RNA molecule is uniquely encoded with a barcoding sequence and each barcoding sequence is flanked by at least one profiling sequence, assessing translatability of the pool of RNA molecules by transfecting a cell or cell lysate with a first subset of the pool of RNA molecules, performing polysome profiling on the first subset of the pool of RNA molecules to segregate RNA molecules based on the number of ribosomes bound to the RNA molecule, and isolating a fraction from the polysome profile to generate a first set of RNA molecules showing a first level of ribosomes bound to the RNA molecules in the set of RNA molecules, and assessing stability of the pool of RNA molecules by treating a second subset of the pool of RNA molecules under an experimental condition, and isolating a fraction from the second subset of the pool of RNA molecules at a specified timepoint to generate a second set of RNA molecules showing stability under the experimental condition for the specified timepoint.

In yet another further embodiment, the method further includes sequencing the barcode sequence of the first set of RNA molecules and the second set of RNA molecules to identify the presence of each RNA molecule in each fraction of RNA molecules.

In another further embodiment again, the method further includes determining translatability and stability of the RNA molecules associated with each barcode sequence in the first set of RNA molecules and the second set of RNA molecules by identifying the prevalence of each barcode in each fraction of RNA molecules.

In another further additional embodiment, the barcoding sequence is selected from SEQ ID NOs: 115-1380.

In yet another further additional embodiment, the profiling sequence is selected from SEQ ID NOs: 1381-1382.

In yet again another further additional embodiment, a method to select for RNA elements includes obtaining a library of RNA molecules, where each RNA molecule comprises a coding sequence, a 5′ untranslated region (5′UTR), and a 3′ untranslated region (3′UTR), where one of the coding sequence, the 5′UTR, or the 3′UTR comprises a degenerate region, assessing a property of the library of RNA molecules, where the property is selected from translatability, in vivo stability, and in vitro stability, and selecting an RNA molecule from the library of RNA molecules showing increase in the property over other RNA molecules in the library of RNA molecules.

In yet another further additional embodiment again, the method further includes sequencing the selected RNA molecule.

In a yet further additional embodiment, the selected RNA molecule is a pool of RNA molecules.

In yet again another further embodiment, the method further includes reassessing the property of the pool of RNA molecules, and selecting an RNA molecule from the pool of RNA molecules showing increase in the property over other RNA molecules in the pool of RNA molecules.

In again another yet further additional embodiment, the method further includes sequencing the selected RNA molecule from the pool of RNA molecules.

In yet again another yet further additional embodiment, the property is translatability.

In yet another yet further additional embodiment again, the degenerate region is selected from a deletion, a random sequence, an ambiguous sequence, and a truncation.

The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.

FIG. 1A illustrates a generalized structure of RNA molecules in accordance with various embodiments of the invention.

FIG. 1B illustrates a method for performing iterative selection of RNA elements to enhance translatability and/or stability in accordance with various embodiments of the invention.

FIG. 2 illustrates a method to screen RNAs for increased translatability in accordance with various embodiments of the invention.

FIG. 3 illustrates a method to screen RNAs for increased in vivo stability in accordance with various embodiments of the invention.

FIG. 4 illustrates a method to screen RNAs for increased in vitro stability in accordance with various embodiments of the invention.

FIG. 5A illustrates a method to screen a pool of RNAs for stability and/or translatability in accordance with various embodiments of the invention.

FIG. 5B illustrates a method to identify RNAs possessing increased translatability and/or stability in accordance with various embodiments of the invention.

FIGS. 6A-6C illustrate exemplary data of a heatmap showing RNA presence in various fractions after polysome profiling. FIG. 6A illustrates a full view of the heatmap, while FIGS. 6B-6C illustrate enlarged views of the heatmap of FIG. 6A.

FIGS. 7A-7C illustrate exemplary data in the form of box and whisker plots showing ribosome load (FIG. 7A), in cell half-life (FIG. 7B), and in solution half-life (FIG. 7C).

FIGS. 8A-8C illustrate exemplary data of correlations between in cell half-life and ribosome load (FIG. 8A), monosome-to-free-subunit ratio (FIG. 8B), and polysome-to-monosome ratio (FIG. 8C).

FIGS. 9A-9B illustrate exemplary data of in cell half-life (FIG. 9A) and ribosome load (FIG. 9B).

FIG. 9C illustrates an exemplary demonstration of how to determine or predict protein expression based on in cell half-life and ribosome load.

DETAILED DESCRIPTION OF THE DISCLOSURE

Turning now to the drawings, systems and methods to enhance RNA translatability and uses thereof, and systems and methods to quantify RNA stability and translatability and uses thereof are provided. Many embodiments provide nucleic acid molecules (e.g., RNA molecules (including messenger RNA (mRNA)), DNA molecules, DNA/RNA hybrid molecules) that allow for an assessment of in vitro, in vivo, in cell, in solution, in storage, and/or any other form of molecular stability. Some embodiments are directed to RNA molecules, including mRNA, with increased translatability and/or stability. Certain embodiments provide RNA molecules used for RNA therapeutics, including vaccines, where one or more of 1) high and sustained expression of RNA (e.g., mRNA), 2) high stability of RNA inside of cells (e.g., in vivo), and 3) high stability of RNA in solution (e.g., in vitro) is desired.

Further embodiments provide methods that provide RNA molecules with increased translatability and/or stability, while additional embodiments provide methods to test translatability and/or stability of RNA molecules. Certain embodiments provide a multiplexed workflow to generate RNA molecules, including mRNA, having increased translatability and/or stability in a single assay. In many embodiments, the RNA molecules of various embodiments are generated via rational design, while certain embodiments generate RNA molecules via iterative selection.

RNA Molecules and Design

As noted above, some embodiments generate RNA molecules via rational design, while others utilize iterative selection. Rational design is a methodology that combines sequence components, such as a 5′ UTR, a 3′ UTR, and/or a coding region, that exist in nature or are synthetically engineered for a specific objective (e.g., increased stability or translatability). However, certain embodiments utilize iterative selection to generate RNA molecules, where various sequence components, such as a 5′ UTR, a 3′ UTR, and/or a coding region, comprise random sequences. Certain embodiments utilizing iterative selection optimize RNA molecules for translatability and/or stability over several rounds of sequence selection (e.g., selecting for sequences showing increased translation or stability).

Turning to FIG. 1A, an exemplary structure for an embodiment of an RNA molecule in accordance with various embodiments is illustrated. Certain embodiments of an RNA molecule possess a 5′ cap moiety. Some embodiments utilize a 7-methyl guanosine triphosphate as the cap moiety, but various additional cap sequences are known in the art for a 5′ cap moiety. Additional embodiments possess a cap-proximal sequence for an mRNA region located at the 5′ end of the mRNA at the 3′ end of the 5′ cap moiety. Various cap sequences are known in the art for a 5′ cap-proximal sequence. Certain embodiments use a small triplet, such as GGG, as the cap-proximal sequence.

Additional embodiments of an RNA molecule possess a 5′ untranslated region (5′UTR) sequence and/or a 3′UTR sequence. Certain embodiments place the 5′UTR near the 5′ end of the RNA molecule, while the 3′UTR is located near the 3′ end of the molecule. In some embodiments, the 5′UTR is located at the 3′ end of a 5′ cap moiety, while additional embodiments place the 5′UTR directly at the 5′ end without a 5′ cap moiety or cap sequence. Similarly, a 3′UTR can be placed at the 3′ end of a molecule, while additional embodiments may have a tailing sequence placed 3′ of the 3′UTR. Certain embodiments select a 5′UTR and/or a 3′UTR for a variety of factors to increase RNA translatability, stability, and/or other property based on an innate sequence, while others select a 5′UTR and/or a 3′UTR that may possess improved translatability, stability, and/or other property based on a particular coding sequence of interest. Many possible 5′UTRs and 3′UTRs are known in the art, which are used in various embodiments. Some specific embodiments of rationally designed RNAs select the 5′UTR from natural or modified 5′UTR elements, including SEQ ID NOs: 1-55. And, certain specific embodiments select the 3′UTR from SEQ ID NOs: 56-80. Tables 1 and 2 list various 5′UTRs and 3′UTRs, respectively, with their respective SEQ ID NOs.

TABLE 1
5′UTR Sequences
Name                        SEQ ID NO:
SynJ                        1
hHBB30                      2
CYP2E1                      3
CYBA                        4
mRpl18a                     5
RpS25                       6
scrUTR                      7
TEV                         8
hHBB                        9
APOA2                       10
TOP_hHBB                    11
C3                          12
hHBB_pA                     13
TCV                         14
PoV_pA_scrUTR               15
TMV                         16
PoV_pA_hHBB                 17
RpL38                       18
CoV-2-TTG-dSL4-5            19
hACTB                       20
RpL31                       21
mRpl18a_hHBB                22
DEN2                        23
RBCS3B                      24
mActb                       25
mActb_inv                   26
TEV_CERT_hHBB               27
Tubb2b                      28
P4_hACTB                    29
hCOL1A2                     30
BYDV                        31
CoV-2-TTG-dSL5              32
P4_mActb                    33
P4_mActb_inv                34
CoV-2-TTG-dSL5A-C           35
CoV-2-TTG-dSL1-3            36
mRpl18a_P4_mActb            37
TEV_P4_mActb                38
P4_TEV_mActb                39
CoV-2-TTG-dSL4-full         40
CoV-2-TTG-dSL4-1            41
CoV-2-TTG-dSL5A             42
CoV-2-TTG-TTGfull-dSL1-3    43
CoV-2-TTG-dSL5B, C          44
CoV-2-TTG-dSL1              45
CoV-2-TTG-dSL4-2            46
CoV-2-TTG-dSL2              47
CoV-2-TTG-dSL3              48
CoV2                        49
CoV2_TTG                    50
CoV2_P4                     51
CoV-2-TTG-TTGfull           52
mHoxa9_IRES                 53
HCV_IRES                    54
RBCS1A                      55

TABLE 2
3′UTR Sequences
Name                        SEQ ID NO:
SINV_URE                    56
CYBA                        57
PV                          58
hHBA1                       59
CYBA_1.5x                   60
BMV                         61
hHBB                        62
AMV                         63
ENE_Wilusz                  64
ENE_Weissman                65
BYDV                        66
TSV                         67
hHBB_F30Pepper              68
P4P6                        69
TCV                         70
hHBBx2                      71
CrPV                        72
SINV                        73
CoV2                        74
DV                          75
RV                          76
WPRE                        77
hActb                       78
mActb                       79
hCOL1A2                     80

FIG. 1B illustrates a method 100 for iteratively selecting elements to increase RNA translatability and/or stability. Such embodiments identify sequences or segments of an expression-affecting region (e.g., 5′UTR, 3′UTR, and/or coding region) that increase translatability, stability, and/or other property of the RNA molecule.

At 102, various embodiments obtain a library of RNA molecules. In certain embodiments, the library comprises RNA molecules with degenerate sequences in regions that affect RNA expression. In certain embodiments, the degenerate expression-affecting region is truncated at its 5′- and/or 3′-end. In some embodiments, the degenerate expression-affecting region contains internal deletions, such that the 5′- and/or 3′-end remain intact, but the overall region is smaller. In certain embodiments, the degenerate sequences are random, ambiguous, and/or mutated sequences to identify specific bases that may play an outsized role in translatability and/or stability.
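By way of non-limiting illustration only, the three kinds of degenerate region described above (end truncations, internal deletions, and randomized windows) can be enumerated in silico before a library is synthesized. The following Python sketch is an assumption of how such variants might be generated; the function names, parameters, and the toy input sequence are hypothetical and are not taken from the Sequence Listing.

```python
import random

def truncations(region: str, max_trim: int) -> list[str]:
    """Variants trimmed by up to max_trim nt at the 5'- and/or 3'-end."""
    variants = set()
    for five in range(max_trim + 1):
        for three in range(max_trim + 1):
            trimmed = five + three
            if trimmed == 0 or trimmed >= len(region):
                continue  # skip the untouched region and empty results
            variants.add(region[five:len(region) - three])
    return sorted(variants)

def internal_deletions(region: str, size: int) -> list[str]:
    """Variants missing `size` internal nt; the first and last nt stay intact."""
    return sorted({region[:i] + region[i + size:]
                   for i in range(1, len(region) - size)})

def randomized(region: str, start: int, length: int, n: int, seed: int = 0) -> list[str]:
    """n variants with a random window of `length` nt beginning at `start`."""
    rng = random.Random(seed)  # seeded so that the hypothetical library is reproducible
    out = []
    for _ in range(n):
        window = "".join(rng.choice("ACGU") for _ in range(length))
        out.append(region[:start] + window + region[start + length:])
    return out
```

For example, `truncations("ACGUAC", 1)` yields the three variants obtained by removing one nucleotide from the 5′-end, the 3′-end, or both.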

At 104, many embodiments assess stability and/or translatability of the molecules. Various methods to assess translatability and/or stability are described herein. At 106, certain embodiments select for molecules having a minimum level of translatability and/or stability, such as through selection of a specific fraction of stability and/or translatability. For example, many embodiments select for fractions having high levels of stability (e.g., at longer time points) and/or translatability (e.g., higher polysome fractions).

Upon assessing stability and/or translatability, certain embodiments sequence the selected molecules at 108. Sequencing the selected molecules identifies the specific sequences that correlate to the tested characteristic (e.g., translatability and/or stability).

It should be noted that many embodiments may perform several features multiple times, such as the assessing 104 and selecting 106 features, in order to identify the sequences having the highest rates of translatability and/or stability. For example, numerous embodiments take the selected molecules (e.g., ones having high levels of translatability and/or stability) and reassess the translatability and/or stability of these molecules, selecting for high levels of translatability and/or stability. Various embodiments repeat the assessing 104 and selecting 106 features 2 times, 3 times, 4 times, 5 times, 6 times, 7 times, 8 times, 9 times, 10 times, or more times to identify molecules having the highest levels of translatability and/or stability. Additionally, various embodiments repeat sequencing 108, such as after each selection 106, or just after every second selection 106.
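The repeated assessing 104 and selecting 106 features can be summarized, purely as a computational analogy, by the loop below. In practice the assessment is the wet-lab assay described herein; the `assess` callback, the `keep_fraction` threshold, and the toy scoring used in the example are hypothetical stand-ins introduced only for illustration.

```python
from typing import Callable, Iterable

def iterative_selection(library: Iterable[str],
                        assess: Callable[[str], float],
                        keep_fraction: float = 0.25,
                        rounds: int = 3) -> list[str]:
    """Repeat assess (104) and select (106) for a fixed number of rounds."""
    pool = list(library)
    for _ in range(rounds):
        # Assess every molecule still in the pool (stand-in for the assay).
        scores = {seq: assess(seq) for seq in pool}
        # Select the best-scoring fraction and carry it into the next round.
        ranked = sorted(pool, key=lambda s: scores[s], reverse=True)
        n_keep = max(1, int(len(ranked) * keep_fraction))
        pool = ranked[:n_keep]
    return pool
```

As a toy usage, scoring by G content with `keep_fraction=0.5` over two rounds reduces an eight-member library to its two highest-scoring members. Sequencing 108 would correspond to inspecting `pool` after any chosen round.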

Methods, such as method 100, allow for iterative selection of expression-affecting regions to further increase translatability, stability, and/or other property. Some exemplary embodiments utilize a pool of 5′UTRs selected from SEQ ID NOs: 81-111. Table 3 identifies specific pools of 5′UTR sequences for iterative selection.

TABLE 3
5′UTR Sequence Pools
Pool 1                      SEQ ID NOs: 81-90
Pool 2                      SEQ ID NOs: 91-101
Pool 3                      SEQ ID NOs: 102-106
Pool 4                      SEQ ID NOs: 107-111

Returning to FIG. 1A, many embodiments of an RNA molecule possess a coding sequence, or CDS, located 3′ from the 5′UTR, and 5′ of the 3′UTR. In many embodiments, the CDS begins (e.g., at its 5′-end) with a start codon (e.g., the canonical AUG and/or any other codon known to begin translation). In many embodiments, the CDS terminates (e.g., at its 3′-end) with a stop codon. In various embodiments the stop codon is a canonical stop codon (e.g., UAG, UAA, UGA), while further embodiments comprise a noncanonical stop codon or another sequence shown to terminate translation. Certain embodiments comprise more than one stop codon in the CDS.
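As a minimal, hypothetical sketch, a candidate CDS can be checked against the canonical start and stop codons named above before it is placed into a construct. The function name and the returned fields are assumptions made for illustration; noncanonical start or stop signals would need to be supplied by the user.

```python
CANONICAL_STOPS = {"UAG", "UAA", "UGA"}

def check_cds(cds: str, start_codons=("AUG",)) -> dict:
    """Report basic start/stop features of a CDS given as RNA or DNA text."""
    rna = cds.upper().replace("T", "U")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    # Count consecutive stop codons at the 3'-end (more than one is allowed).
    terminal_stops = 0
    for codon in reversed(codons):
        if codon not in CANONICAL_STOPS:
            break
        terminal_stops += 1
    body = codons[:len(codons) - terminal_stops]
    return {
        "in_frame": len(rna) % 3 == 0,
        "has_start": bool(codons) and codons[0] in start_codons,
        "terminal_stops": terminal_stops,
        # Codon indices of stop codons that occur before the terminal run.
        "premature_stops": [i for i, c in enumerate(body) if c in CANONICAL_STOPS],
    }
```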

The coding sequence is a designed sequence of interest to encode a protein or peptide of interest. In certain embodiments, the coding sequence encodes an epitope or other antigen to induce an immune response, thus allowing creation of a vaccine. In various embodiments, the protein or peptide of interest is used as a therapeutic directly, such that the protein or peptide of interest replaces or supplements a dysfunctional protein or peptide. In some embodiments, the protein or peptide of interest corrects for dysfunction of another protein or peptide. While protein coding sequences are described in the context of this exemplary embodiment, additional embodiments possess sequences for non-coding RNAs, such as RNAs that guide genome editing and/or coat chromatin. Various embodiments possess a CDS encoding a reporter gene; for example, nanoluciferase (“Nluc”, SEQ ID NO: 112), green fluorescent protein (“GFP”, SEQ ID NO: 113), and/or any other reporter gene of interest. Various embodiments encode a therapeutic, such as a multi-epitope vaccine (“MEV”, SEQ ID NO: 114).

Additional embodiments of an RNA molecule include a barcode to identify particular molecules based on unique sequences. Many barcode schemes are known in the art and range from 2 to 12 or more nucleotides. In many embodiments, the barcodes are 6-9 nucleotides in length. Certain embodiments select one or more barcodes from SEQ ID NOs: 115-1380.

To read barcodes, an RNA molecule can include one or more profiling sequences that can be used by PCR primers or sequencing primers to amplify and/or sequence the barcode region. In some embodiments profiling sequences are located at the 5′ and/or 3′ end of a barcode. In many embodiments, profiling sequences flank the barcode. In various embodiments profiling sequences are selected from profiling sequence 1 (SEQ ID NO: 1381) and profiling sequence 2 (SEQ ID NO: 1382).

As noted above, some embodiments of an RNA molecule possess a tailing sequence located at the 3′ end of a molecule. In various embodiments the tailing sequence is used to add a poly-A tail or other structural sequence to an RNA molecule. In some embodiments, the tailing sequence is selected as SEQ ID NO: 1383.

Structures, such as those described above in regard to FIG. 1A, allow for modular and combinatorial testing of various 5′UTRs, CDSs, and 3′UTRs.
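The modular structure lends itself to a simple in silico assembly in which every 5′UTR, CDS, and 3′UTR combination receives its own barcode. The sketch below is illustrative only: all sequences passed in are placeholders, the function name is hypothetical, and the element order (cap-proximal triplet, 5′UTR, CDS, 3′UTR, flanked barcode, tailing sequence) is an assumption, since the disclosure states only that the barcode lies 3′ of the coding sequence.

```python
from itertools import product

def assemble_library(utr5s: dict, cdss: dict, utr3s: dict, barcodes: list,
                     flank5: str, flank3: str,
                     cap_proximal: str = "GGG", tail: str = "") -> dict:
    """Build one barcoded construct per 5'UTR x CDS x 3'UTR combination."""
    combos = list(product(utr5s.items(), cdss.items(), utr3s.items()))
    if len(barcodes) < len(combos):
        raise ValueError("need at least one unique barcode per combination")
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("barcodes must be unique")
    library = {}
    for barcode, ((n5, s5), (nc, sc), (n3, s3)) in zip(barcodes, combos):
        # Assumed order; the barcode is flanked by the two profiling sequences.
        seq = cap_proximal + s5 + sc + s3 + flank5 + barcode + flank3 + tail
        library[barcode] = {"name": f"{n5}|{nc}|{n3}", "sequence": seq}
    return library
```

Keying the result by barcode mirrors how reads are later traced back to a construct: a barcode count looks up the UTR and CDS combination directly.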

Methods of Assessing RNA Translatability

Certain embodiments assess translatability of RNA molecules, such as those described elsewhere herein. An exemplary embodiment of a method 200 to assess translatability is illustrated in FIG. 2. In method 200, an RNA molecule is obtained at 202 of many embodiments. In certain embodiments, the RNA molecule is generated via in vitro transcription. Additionally, certain embodiments generate an RNA transcript and/or further modify RNA transcripts to be used for translation (e.g., including a 5′ cap and/or a 3′ polyA tail). Some embodiments obtain DNA templates from a commercial vendor. In various embodiments, polymerase chain reaction (PCR) is used to amplify a full-length DNA template for the RNA molecule. Additional embodiments assess amplicon quality via electrophoresis, including gel (agarose and/or polyacrylamide) and/or capillary electrophoresis (e.g., ABI 3700 and/or Agilent Bioanalyzer). Further embodiments transcribe DNA amplicons to RNA using a DNA-dependent RNA polymerase. Certain embodiments perform the in vitro transcription using commercial kits, including Thermo's T7 MEGAScript kit. Various embodiments modify the RNA transcripts with a 5′ cap and/or polyA tail. These modifications can be accomplished using kits, such as the Cellscript kit and/or any other applicable and commercially available kit. Additional cleanups can be accomplished at various stages (e.g., after PCR, after transcription, and/or after modification), using columns or reagents, such as Thermo's MEGAClear columns. And, quality of the transcribed and/or modified RNAs can be assessed via electrophoresis, including gel and capillary electrophoresis. Further embodiments quantify the RNA pool via various known means, such as spectrophotometry, fluorometry, and/or any other known method for quantifying nucleic acids.

In various embodiments, the RNA molecule is obtained as a pool of RNA molecules, where each unique RNA sequence in the pool comprises a unique barcode, such as described herein. In certain embodiments, the RNA molecules within the pool are approximately the same length. In certain embodiments, the RNA molecules within the pool vary in length.

Various embodiments transfect RNA transcripts into cells or add the transcripts to a cellular lysate at 204. In certain embodiments, transfection occurs on cultured cells or tissue, including mammalian cells, while other embodiments use yeast, bacteria, or plant cells. Some specific embodiments transfect HEK293T cells. Various embodiments incubate the transfected cells to allow for translation of the RNAs. Incubation can last between 1 hour and several days (e.g., 7-10 days) at temperatures and/or conditions to encourage cellular growth and translation. Culture media can include antibiotics or other selective reagents to prevent growth of non-transfected cells and/or contamination. Certain embodiments utilize a cellular lysate as a proxy of in vivo stress on RNA. In such embodiments, cultured cells are lysed via a known method, such as sonication, hydrodynamic stress, or any other method to generate cellular lysate. In various embodiments, the RNA molecule(s) are added to the lysate and allowed to react for a period of time, such as between 1 hour and several days (e.g., 7-10 days) and at temperatures commensurate with the operating temperature for the RNA (e.g., average body temperature, 37° C.).

At 206, certain embodiments perform polysome profiling. In various embodiments, the polysome profiling separates RNA molecules or transcripts based on the number of ribosomes located on, or bound to, a transcript or RNA molecule. As ribosomes are the machinery for translation, the number of ribosomes located on a transcript is indicative of the translatability of a particular transcript.

In certain embodiments, polysome profiling uses a sucrose gradient (e.g., a continuous sucrose gradient) to fractionate RNA molecules based on the number of ribosomes (e.g., polysomes) located on the transcript. Various embodiments perform polysome profiling by lysing transfected cells and applying the lysate to a column containing a sucrose gradient. In embodiments where RNA transcripts are applied to a cellular lysate, the lysate is directly added to a sucrose gradient column. Centrifugation is applied to the column to separate transcripts based on the number of ribosomes attached to a transcript.

At 208, many embodiments isolate or extract one or more fractions of RNA molecules from the polysome profile. In certain embodiments, the fractions are isolated from the sucrose gradient. In various embodiments, the fractions are isolated as slices, drops, and/or other method of obtaining a fraction from a sucrose gradient. Actively translating RNA molecules have a higher number of ribosomes associated with them and are found in polysomal fractions (e.g., more ribosomes bound to the RNA molecule) whereas non-translating/poorly-translating RNA molecules are present in a free RNA fraction or associated with ribosomal subunits (e.g., 40S ribosomal subunit). In certain embodiments, fractions representing higher amounts of ribosomes bound to RNA are isolated, while some embodiments isolate fractions representing a range of ribosomes bound to RNA in order to identify a distribution of ribosomes present for a particular transcript sequence. RNA molecules from an isolated fraction can be cleaned up via known procedures or kits, including columns.
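Where a range of fractions is isolated, the distribution for a particular transcript can be expressed as the share of its barcode counts found in each fraction. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, and summarizing the distribution as a weighted mean number of ribosomes is one plausible way to arrive at a single "ribosome load" value, not a definition given in this disclosure.

```python
def fraction_distribution(counts: dict[str, list[float]]) -> dict[str, list[float]]:
    """Convert per-fraction barcode counts into per-barcode proportions."""
    dist = {}
    for barcode, per_fraction in counts.items():
        total = sum(per_fraction)
        dist[barcode] = ([c / total for c in per_fraction] if total
                         else [0.0] * len(per_fraction))
    return dist

def mean_ribosome_load(proportions: list[float],
                       ribosomes_per_fraction: list[float]) -> float:
    """Assumed summary: proportion-weighted mean ribosomes per transcript."""
    return sum(p * r for p, r in zip(proportions, ribosomes_per_fraction))
```

For instance, a barcode with counts of 10, 30, and 60 in fractions assigned 0, 1, and 3 ribosomes has proportions of 0.1, 0.3, and 0.6 and a weighted mean of 2.1 ribosomes under this assumed summary.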

Certain embodiments introduce a known amount of one or more RNA molecules as a spike-in. Spike-ins serve as an internal reference to allow for quantification of molecules within the assessed RNA library. Such spike-ins are unique RNA molecules that are not present in the analyzed RNA library. The spike-ins can be similar in length to the molecules in the library, and/or possess unique sequences or barcodes.

Various embodiments identify the RNA molecules located in the one or more fractions based on their barcodes at 210. As noted above in relation to FIG. 1 , many embodiments of RNA molecules contain a barcode sequence (e.g., SEQ ID NOs: 115-1360). The profiling sequences flanking the barcodes (e.g., SEQ ID NOs: 1381-1382) can be used to amplify the barcode or can be used as sequencing primers for barcoding reads of the RNA molecules of certain embodiments. Further embodiments utilize hybridization probes, quantitative PCR (qPCR), or any other known method with or without pooling strategies to identify which RNAs are present in each fraction.

Methods of Assessing In Vivo or In-Cell RNA Stability

Certain embodiments assess the stability of RNA molecules, including stability within in vivo and in vitro environments. An exemplary embodiment of a method 300 to assess stability is illustrated in FIG. 3 . In method 300, RNA is obtained at 302. Obtaining RNA at 302 can be accomplished via many methods, including such steps as described in regard to method 200 (FIG. 2 ), including the obtention of a pool of RNA molecules, where each unique RNA sequence is identifiable by a unique barcode.

Various embodiments transfect RNA transcripts into cells or add the transcripts to a cellular lysate at 304. In certain embodiments, transfection occurs on cultured cells or tissue, including mammalian cells, while other embodiments use yeast, bacteria, or plant cells. Some specific embodiments transfect HEK293T cells. Various embodiments incubate the transfected cells. Incubation can last between 1 hour and several days (e.g., 7-10 days) at temperatures and/or conditions to encourage cellular growth. Culture media can include antibiotics or other selective reagents to prevent growth of non-transfected cells and/or contamination. Certain embodiments utilize a cellular lysate as a proxy of in vivo stress on RNA. In such embodiments, cultured cells are lysed via a known method, such as sonication, hydrodynamic stress, or any other method to generate cellular lysate. Then, the RNAs are added to the lysate and allowed to react for a period of time, such as between 1 hour and several days (e.g., 7-10 days), and at temperatures commensurate with the operating temperature for the RNA (e.g., average body temperature, 37° C.).

At 306, certain embodiments isolate RNAs based on in-cell stability. In various embodiments, RNAs are isolated from transfected cells, while some embodiments isolate the RNAs from a cellular lysate. Certain embodiments isolate RNA from transfected cells at various time points (e.g., after 1 hour, 2 hours, 3 hours, 6 hours, 12 hours, 24 hours, etc.) to create time-based fractions of RNAs. Based on the relative amounts of an RNA at the different timepoints, an assessment of RNA stability can be derived, and an RNA half-life can be calculated. Additionally, isolated RNA molecules can be cleaned up via known procedures or kits, including isolation protocols, kits, columns, or any other known method for isolating RNA from cells or a lysate.

Some embodiments select for stable RNAs by performing reverse transcription PCR (RT-PCR) to amplify long, full-length RNA regions, for example the full-length mRNA, full-length CDS, 5′ UTR-CDS, 3′ UTR-CDS, or any other length covering a functional region, or only the barcode region, into complementary DNA (cDNA). By creating cDNAs, downstream amplifications can utilize DNA-dependent polymerases to create sequencing libraries or other molecules for analysis. Such embodiments select for full-length RNAs, or RNAs of any longer functional length, rather than RNAs that may have been hydrolyzed but may still be of sufficient length that electrophoresis or other methods do not remove them.

Certain embodiments introduce a known amount of one or more RNA molecules as a spike-in. Spike-ins serve as an internal reference to allow for quantification of molecules within the assessed RNA library. Such spike-ins are unique RNA molecules that are not present in the analyzed RNA library. The spike-ins can be similar in length to the molecules in the library, and/or possess unique sequences or barcodes.

Various embodiments identify the RNAs based on their barcodes at 308. As noted above in relation to FIG. 2 , many embodiments of RNA molecules contain a barcode sequence (e.g., SEQ ID NOs: 115-1380). The profiling sequences flanking the barcodes (e.g., SEQ ID NOs: 1381-1382) can be used to amplify the barcode or can be used as sequencing primers for barcoding reads of the RNA molecules of certain embodiments. Further embodiments utilize hybridization probes, quantitative PCR (qPCR), or any other known method with or without pooling strategies to identify which RNAs are present in timepoint-based fractions.

Determination of In Vitro or in Solution RNA Stability

An additional challenge for RNA therapeutics, including vaccines, is stability in storage, such as between manufacture and actual treatment or delivery to an individual. Such stability is referred to as in vitro stability, as it emphasizes stability in non-biological environments, such as in vials, syringes, or other methods of storage. Various embodiments provide a method to measure in vitro stability of RNAs. Turning to FIG. 4 , a method 400 to determine in vitro RNA stability in accordance with various embodiments is illustrated. Within method 400, RNA is obtained at 402. Obtaining RNA at 402 can be accomplished via many methods, including such steps as described in regard to method 200 (FIG. 2 ), including the obtention of a pool of RNA molecules, where each unique RNA sequence is identifiable by a unique barcode.

At 404 of many embodiments, the RNA pool is treated or subjected to an experimental condition. The experimental conditions include any condition that may cause degradation of an RNA molecule in a storage situation, including (but not limited to) temperature, pH, presence of certain molecules and/or ions, concentration of certain molecules and/or ions, irradiation, time, buffer type, buffer concentration, and/or any other condition that can affect RNA stability. Such conditions are meant to reproduce actual conditions that can induce one or more hydrolytic events within the RNA molecules. A hydrolytic event, in accordance with various embodiments, causes a break within the RNA molecule, resulting in a broken or incomplete RNA molecule. Incomplete or broken RNA molecules may be insufficient for use as a therapeutic, as they may be prone to degradation or ineffective in protein production; thus, incomplete or broken RNA molecules may limit the efficacy of the molecule as a therapeutic.

Further embodiments select for stable RNAs in the pool at 406. In some embodiments, the selection occurs by size selecting for full-length RNAs, such as through electrophoresis, including (but not limited to) agarose gel electrophoresis, polyacrylamide electrophoresis, and capillary electrophoresis.

Some embodiments select for stable RNAs by performing reverse transcription PCR (RT-PCR) to amplify long RNA regions, for example the full-length mRNA, full-length CDS, 5′ UTR-CDS, 3′ UTR-CDS, or any other length covering a functional region, or only the barcode region, into complementary DNA (cDNA). By creating cDNAs, downstream amplifications can utilize DNA-dependent polymerases to create sequencing libraries or other molecules for analysis. Such embodiments select for full-length RNAs, or RNAs of any longer functional length, rather than RNAs that may have been hydrolyzed but may still be of sufficient length that electrophoresis or other methods do not remove them.

Certain embodiments introduce a known amount of one or more RNA molecules as a spike-in. Spike-ins serve as an internal reference to allow for quantification of molecules within the assessed RNA library. Such spike-ins are unique RNA molecules that are not present in the analyzed RNA library. The spike-ins can be similar in length to the molecules in the library, and/or possess unique sequences or barcodes.

Many embodiments isolate RNAs based on in vitro or in solution stability at 408. Certain embodiments isolate RNA from a solution at various time points (e.g., after 1 hour, 2 hours, 3 hours, 6 hours, 12 hours, 24 hours, etc.) to create time-based fractions of RNAs from a solution. Based on the amount of an RNA relative to its amount at timepoint 0, an assessment of RNA stability can be derived, and an RNA half-life can be calculated. Additionally, isolated RNA molecules can be cleaned up via known procedures or kits, including isolation protocols, kits, columns, or any other known method for isolating RNA from cells or a lysate.

At 410, stable RNAs are identified. In various embodiments, the undigested or gel-extracted RNAs are sequenced using the barcode to identify the particular molecules that are stable. In many embodiments, cDNAs created in 406 are utilized as templates to create a sequencing library to avoid the amplification of RNAs that may be near full length.

Identifying RNAs Having Enhanced Translatability, Stability, and/or Other Property

Turning to FIG. 5A, certain embodiments are capable of simultaneously assessing one or more of translatability, stability, and/or any other property. Such embodiments assess one or more of translatability, in vivo (or in cell) stability, in vitro (or in solution) stability, and/or any other property. Within method 500, RNA is obtained at 502. Obtaining RNA at 502 can be accomplished via many methods, including such steps as described in regard to method 200 (FIG. 2 ), including the obtention of a pool of RNA molecules, where each unique RNA sequence is identifiable by a unique barcode. Many embodiments perform one or more of assessing translatability 504, assessing in vivo (or in cell) stability 506, and/or assessing in vitro (or in solution) stability 508. Assessing translatability 504 can be performed via methods such as method 200 (FIG. 2 ), while assessing in vivo stability 506 can be performed via method 300 (FIG. 3 ), and assessing in vitro stability 508 can be performed via method 400 (FIG. 4 ). Upon obtaining fractions from the one or more of assessing translatability 504, assessing in vivo stability 506, and/or assessing in vitro stability 508, various embodiments can identify RNAs at 510.

Turning to FIG. 5B, various embodiments identify RNA molecules possessing increased translatability in method 550. At 552, many embodiments obtain identities of RNA molecules present in various fractions of translatability (e.g., RNAs assessed via methods 200, 300, 400, and/or 500). In various embodiments, these identities include the barcode or barcodes that identify each of the RNA molecules in a fraction and a read count of each barcode in each fraction.

At 554, various embodiments determine the translatability of each RNA molecule by identifying the prevalence of each barcode in each fraction. Certain embodiments perform statistical analyses to determine the relative prevalence of the barcode in each fraction. The presence of an RNA in fractions correlating to more ribosomes, as compared to other fractions across the whole polysome profile gradient, indicates increased translatability of that particular RNA molecule.

Some embodiments filter RNA molecules based on particular characteristics at 556. Particular characteristics may be specific cutoffs, minimum levels of translatability, or a statistical distribution of a particular barcode. For example, certain embodiments may select barcodes that have a narrower distribution with a lower average ribosomal load (e.g., fewer ribosomes on RNA molecules), while other embodiments may select for a higher average with a broader overall distribution.

Various embodiments deconvolve the barcodes at 558, where deconvolution involves matching the specific RNA sequence or sequence name with the barcode sequence comprised within that RNA molecule.
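As a minimal sketch of the deconvolution at 558, the matching can be pictured as a lookup from barcode sequence to construct name. The function name, barcode sequences, and construct names below are hypothetical and chosen only for illustration; they are not taken from the Sequence Listing.

```python
def deconvolve(barcode_counts, barcode_to_name):
    """Map barcode read counts to construct names.

    Barcodes that are absent from the lookup table are returned separately
    so that they can be inspected rather than silently dropped.
    """
    named, unknown = {}, {}
    for barcode, count in barcode_counts.items():
        name = barcode_to_name.get(barcode)
        if name is None:
            unknown[barcode] = count
        else:
            named[name] = named.get(name, 0) + count
    return named, unknown


# Hypothetical lookup table and read counts for one fraction.
barcode_to_name = {"ACGTAC": "construct_A", "TTGCAA": "construct_B"}
counts = {"ACGTAC": 120, "TTGCAA": 45, "GGGGGG": 3}
named, unknown = deconvolve(counts, barcode_to_name)
```

Keeping unmatched barcodes in a separate dictionary is a design choice of this sketch; it makes sequencing errors or contaminating reads visible at the deconvolution step.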

Additional embodiments output results of translatability, stability, and/or other property at 560. Certain embodiments provide lists of each of the sequences providing a specific cutoff or parameter for minimum translatability, stability, and/or other property. Various embodiments produce a graphical display or visualization, such as a dot plot, heat map, or other graph or chart to visualize stability (e.g., in vivo, in vitro, in cell, in solution, etc.), translatability, and/or any other property of a particular RNA molecule.

Additional embodiments output results of predicted protein expression at a given time or total protein expression over time, from experimentally determined stability and translatability. For this, additional embodiments can use modelling of the empirical data to estimate the predicted protein expression in a pool of hundreds of different RNA molecules based on measurements of a selected number of RNA designs.

Enhanced Translatability of RNA Molecules

Turning to FIGS. 6A-6C, exemplary results of embodiments showing translation efficiency are illustrated, where FIG. 6A illustrates a heatmap and FIGS. 6B-6C show enlarged portions of FIG. 6A. FIGS. 6A-6C illustrate the relative prevalence of 64 unique RNA molecules in accordance with various embodiments based on polysome fraction. Darker cells indicate a lower relative prevalence of the molecule in a particular fraction, while lighter cells indicate a higher relative prevalence of the molecule in a particular fraction.

Additionally, FIGS. 7A-7C illustrate exemplary data plotting ribosomal load (FIG. 7A), half-lives for in-cell (or in vivo) stability (FIG. 7B), and half-lives for in solution (or in vitro) stability (FIG. 7C) of various mRNA molecules, including mRNA molecules having 5′UTR variants, 3′UTR variants, both 5′UTR and 3′UTR variants, and various CDS sequences, including from Nluc, eGFP, and MEV. In FIG. 7A, ribosomal load is determined by the equation listed in that figure.

FIGS. 8A-8C illustrate exemplary data showing correlations of in cell mRNA half-life to ribosomal load (FIG. 8A), in cell mRNA half-life to monosome-to-free-subunit ratio (FIG. 8B), and in cell mRNA half-life to polysome-to-monosome ratio (FIG. 8C).

Given the assessment of in cell stability and translatability in accordance with various embodiments, further embodiments determine protein expression levels of proteins encoded in a CDS of the molecule. Certain embodiments determine protein expression via the equation:

$P(t) \sim k_{t}\,\frac{e^{-k_{p}t} - e^{-k_{m}t}}{k_{m} - k_{p}}$

where P(t) is protein quantity at time t; k_(t) is translation rate; and k_(m) and k_(p) are rates of mRNA and protein decay, respectively.
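The expression above can be evaluated numerically as in the following sketch. It computes only the right-hand side of the proportionality, so the result is in arbitrary units. The function name and the example rate values are hypothetical; the conversion of a half-life to a first-order rate (rate = ln(2)/half-life) is a standard identity and is an assumption about how measured half-lives would be supplied. The branch for k_(m) equal to k_(p) uses the analytic limit t·e^(−kt), which avoids division by zero.

```python
import math


def predicted_protein(t, k_t, k_m, k_p):
    """Right-hand side of P(t) ~ k_t * (exp(-k_p*t) - exp(-k_m*t)) / (k_m - k_p)."""
    if math.isclose(k_m, k_p):
        # Analytic limit of the expression as k_m approaches k_p.
        return k_t * t * math.exp(-k_m * t)
    return k_t * (math.exp(-k_p * t) - math.exp(-k_m * t)) / (k_m - k_p)


def rate_from_half_life(half_life):
    """First-order decay rate corresponding to a given half-life."""
    return math.log(2) / half_life


# Hypothetical inputs: 6 h mRNA half-life, 24 h protein half-life, unit translation rate.
k_m = rate_from_half_life(6.0)
k_p = rate_from_half_life(24.0)
expression_at_12h = predicted_protein(12.0, k_t=1.0, k_m=k_m, k_p=k_p)
```

Because k_(t) enters only as a multiplicative factor, constructs with equal decay rates are ranked by translation rate alone, while differences in k_(m) change the shape and timing of the expression curve.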

An exemplary demonstration of predicted expression is illustrated in FIGS. 9A-9C, where FIG. 9A illustrates in cell half-life of various mRNA constructs, FIG. 9B illustrates ribosomal load of various constructs, and FIG. 9C illustrates the predicted expression.

EXEMPLARY EMBODIMENTS

Although the following embodiments provide details on certain embodiments of the inventions, it should be understood that these are only exemplary in nature, and are not intended to limit the scope of the invention.

Example 1: In Vitro Transcription of Reporter mRNAs

Method: Preparation of mRNAs was based on in vitro transcription from DNA templates. DNA templates were amplified by PCR using AccuPrime Pfx (Life Technologies, 12344024) and purified using the Monarch PCR & DNA Cleanup Kit (NEB, T1030L). The 3×HA-Nluc starting CDS (“Nluc start”) is derived from the pcDNA3.1-5′UTR-3×HA-Nluc plasmid encoding the HA-tagged Nanoluc CDS. Individual template DNA or the 233-mRNA library was amplified from linear DNA synthesized on a BioXP 3200 system (Codex DNA) or by Twist Bioscience, using the fixed forward (T7_F_28nt) and reverse (const3_R) primers. The forward primer binds to the T7 RNA polymerase promoter common to the DNA templates for all mRNA designs; the reverse primer is complementary to a common “const3” region at the end of all tested mRNA 3′ UTRs. For the IVT template pool, individual DNA templates were pooled at an equimolar concentration into a template pool of hundreds of constructs and amplified with outer primers in a pooled format. For the pooled template, 1 μL of each construct (˜20 ng/μL stock concentration) was pooled to be used as the PCR template. The Pfx PCR contained the following: 2.5 μL 10× Pfx buffer, 0.25 μL forward primer (100 μM), 0.25 μL reverse primer (100 μM), 0.75 μL DMSO (NEB), 0.25 μL Pfx Polymerase (Thermo), 20.5 μL water, and 0.5 μL template DNA (˜20-50 ng/μL), in a total 25 μL reaction with the following program: 2 min at 95° C.; 10 sec at 95° C.; 30 sec at 58° C.; 30 s or 1 min at 68° C.; cycled 9×; final extension of 5 min at 68° C. PCR reactions were purified with the Monarch PCR & DNA Cleanup Kit (NEB, T1030L). For the hHBB-Fluc control mRNA, the DNA template was amplified from the pGL3-HBB plasmid using the primers KL588/KL589, which yielded a PCR product of 1,750 bp in length. For cloning the MALAT1 ENE 3′ UTR stem-loop, we first amplified the ENE region using primers ENE-1/ENE-2 with flanking constant regions. The resulting amplicon was assembled with a hHBB-Nluc sequence that lacked a 3′ UTR but maintained a unique barcode using a NEBuilder HiFi Assembly Kit (NEB, ES2621).

In vitro transcription was performed with the MEGAscript T7 kit (Ambion, AM1333) according to the manufacturer's instructions. A 20 μL transcription reaction contained max. 5 μg linear DNA template, 4 mM of each NTP (Ambion), 2 μL/200 U MEGAscript T7 RNA polymerase (Ambion) and 1× T7 MEGAscript Transcription Buffer (Ambion). After a total incubation for 3 hours at 37° C., the DNA was digested by addition of 1 μL/2 U Turbo DNase (Ambion, AM2238) for 15 min at 37° C. For pseudouridylated mRNAs, pseudouridine triphosphate (Trilink Biotechnologies, N1019-5) was substituted for uridine triphosphate at an equivalent concentration. mRNA was purified using MegaClear columns (Thermo Scientific, Ambion, AM1908). A 20 μL reaction usually yielded 100-150 μg of RNA.

For mRNA transfection of HEK293T cells, m⁷G-capped and polyadenylated mRNAs were generated as follows. In vitro transcribed mRNA was m⁷G-capped and polyadenylated using the ScriptCap m7G Capping System (CellScript, C-SCCE0625) and A-Plus Poly(A) Polymerase Tailing Kit (CellScript, C-PAP5104H), respectively, according to the manufacturer's instructions with the following modifications. Aliquots of 30 μg of each RNA were processed in parallel, diluted to 34.25 μL in water, heated for 5 min at 65° C. to denature, and placed on ice. The 50 μL capping reaction contained 5 μL 10× ScriptCap buffer (CellScript), 5 μL 10 mM GTP (CellScript), 2.5 μL 2 mM S-adenosyl-methionine (SAM, 20 mM stock, CellScript), 1.25 μL ScriptGuard RNase Inhibitor (CellScript), and 2 μL Capping enzyme (20 U, CellScript, 10 U/μL). For the capping step, the 37° C. incubation was performed for 1 hour and the capped RNA was placed on ice. Polyadenylation was performed on the resulting RNAs without purification in between. The polyA reaction contained 30 μg of capped mRNA in 50 μL, 6.6 μL 10× A-Plus polyA tailing buffer (CellScript), 6.6 μL 10 mM ATP (CellScript), 0.3 μL ScriptGuard RNase Inhibitor (CellScript), and 2.5 μL A-Plus PolyA Polymerase (10 U, 4 U/μL, CellScript) in a total reaction volume of 66 μL. We aimed to add a 150 nt-long polyA-tail, for which we incubated the capped mRNA for 30 min at 37° C. with 10 U of polyA enzyme, after which the reaction was placed on ice. The mRNA was again purified using MegaClear columns. mRNA concentration was determined on a Nanodrop 2000 (Thermo Fisher). This usually yields 30-40 μg of capped and polyadenylated mRNA. mRNA quality was determined by 4% urea-PAGE, 1% formaldehyde agarose gel or capillary electrophoresis with an Agilent 2100 Bioanalyzer (Agilent Technologies).

Example 2: Cell Culture and Transfections

Method: HEK293T (ATCC: CRL-3216) cells were cultured in Dulbecco's Modified Eagle's Medium (DMEM, Gibco, 11965-118) containing 2 mM L-glutamine, supplemented with 10% fetal bovine serum (EMD Millipore, TMS-013-B), 100 U/ml penicillin and 0.1 mg/ml streptomycin (EmbryoMax ES Cell Qualified Penicillin-Streptomycin Solution 100×; EMD Millipore, TMS-AB2-C or Gibco, 15140-122) at 37° C. in 5% CO₂-buffered incubators. For transfection of pooled 5′ m⁷G-capped and poly(A)-tailed RNAs, 5.0×10⁶ HEK293T cells were seeded in a 10 cm plate 24 h before transfection. 10 μg of pooled RNAs were transfected using Lipofectamine MessengerMax as per manufacturer's instructions (Life Technologies). Media was changed 3 h after transfection and replaced with complete DMEM supplemented with 10% FBS and Pen/Strep. For transfections of individual m⁷G-capped RNAs, 3.0×10⁴ HEK293T cells were seeded per well 24 h before transfection in a 96-well plate. Subsequently, 10 ng of Nluc RNA was co-transfected with 20 ng of m⁷G-capped HBB-Fluc control RNA using Lipofectamine MessengerMax as per manufacturer's instructions (Life Technologies).

Example 3: Sucrose Gradient Fractionation Analysis

Method: Cell culture media was replaced with media containing cycloheximide (MilliporeSigma, C7698-1G) at 100 μg/mL. After 2 minutes, cells were washed, trypsinized and harvested using PBS, trypsin, and culture media containing 100 μg/mL cycloheximide. ˜10×10⁶ cells were resuspended in 400 μL of the following lysis buffer on ice for 30 min, vortexing every 10 min: 25 mM Tris-HCl pH 7.5, 150 mM NaCl, 15 mM MgCl₂, 1 mM DTT, 8% glycerol, 1% Triton X-100, 100 μg/mL cycloheximide, 0.2 U/μL Superase-In RNase inhibitor (ThermoFisher Scientific, AM2694), 1× Halt protease inhibitor cocktail (ThermoFisher Scientific, 78430), 0.02 U/μL TURBO DNase (ThermoFisher Scientific, AM2238). After lysis, nuclei were removed by two-step centrifugation, first at 1300 g for 5 min and second at 10000 g for 5 min, taking the supernatant from each. A 25%-50% sucrose gradient was prepared in 13.2 mL ultracentrifuge tubes (Beckman Coulter, 331372) using a Biocomp Gradient Master with the following recipe: 25 or 50% sucrose (w/v), 25 mM Tris-HCl pH 7.5, 150 mM NaCl, 15 mM MgCl₂, 1 mM DTT, 100 μg/mL cycloheximide. The lysate was layered onto the sucrose gradient and ultracentrifuged on a Beckman Coulter SW-41Ti rotor at 40000 rpm for 150 min at 4° C. The gradient was density fractionated using a Brandel BR-188 into 16×750 μL fractions, and an in vitro transcribed spike-in RNA mix (120002B1, 120010B1, 220023B1, 310333T3; 1000, 100, 10, 1-fold dilutions respectively) was added to each fraction. 700 μL of each fraction was mixed with 100 μL 10% SDS, 200 μL 1.5 M sodium acetate, and 900 μL acid phenol-chloroform, pH 4.5 (ThermoFisher Scientific, AM9720), heated at 65° C. for 5 min, and centrifuged at 20,000 g for 15 min at 4° C. for phase separation. 600 μL of the aqueous phase was mixed with 600 μL 100% ethanol and RNA was purified on silica columns (Zymo, R1013).

Example 4: Polysome Selection and Library Preparation

Method: The variant 5′ UTR is composed of: fixed first 29 nt of hHBB, variable 35 nt (initially degenerate) and 6 nt Kozak consensus. To generate the reporter mRNA pool containing the variant 5′ UTR library, the IVT template was first assembled by PCR under the following conditions: 4 μL 10× AccuPrime Pfx Reaction Mix, 0.4 pmol HBB29_N35 amplicon, 0.4 pmol Nluc_HBB_3UTR, 0.4 μL AccuPrime Pfx Polymerase in 40 μL of total reaction volume. Cycling conditions were: 95° C. for 120 sec, and 19 cycles of 95° C. for 15 sec, 66° C. for 30 sec, 68° C. for 75 sec. The PCR product was purified on silica columns (NEB T1034) and amplified under the following conditions: 4 μL 10× AccuPrime Pfx Reaction Mix, 4 μL 10 μM T7_28_HBB_30_F, 4 μL 10 μM Nanoluc_ORF_R, 0.4 μL AccuPrime Pfx Polymerase in 40 μL total reaction volume. Cycling conditions were: 95° C. for 120 sec, and 4 cycles of 95° C. for 15 sec, 66° C. for 30 sec, 68° C. for 75 sec. The mRNA was in vitro transcribed, capped and polyadenylated as described above. This yields an estimated initial starting degenerate pool complexity of ˜2.4×10¹¹.
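As a worked arithmetic check, the sketch below converts 0.4 pmol of degenerate amplicon into a molecule count and compares it with the theoretical sequence space of a 35-nt degenerate region. Linking the stated complexity of ˜2.4×10¹¹ to the 0.4 pmol input is an assumption of this sketch, made because the two numbers agree; the text does not state how the estimate was derived. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_from_pmol(pmol):
    """Number of molecules in a given amount expressed in picomoles."""
    return pmol * 1e-12 * AVOGADRO


# 0.4 pmol of HBB29_N35 amplicon, as used in the assembly PCR.
n_molecules = molecules_from_pmol(0.4)

# Theoretical number of distinct sequences for 35 fully degenerate positions.
sequence_space = 4 ** 35
```

Under this reading, the pool is limited by the number of input molecules (about 2.4×10¹¹) rather than by the sequence space (about 1.2×10²¹), so nearly every molecule in the starting pool is expected to carry a distinct variable region.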

Transfection of HEK-293 cells and sucrose gradient fractionation were performed as described above. Equal volumes of fractions 10-16 were pooled and RNA was extracted by acidic phenol chloroform extraction followed by column purification (Zymo Research, R1013) as described above. One third of the lysate volume was kept as input before layering onto the sucrose gradient, and RNA was extracted from the input lysate by Trizol extraction followed by column purification. 1.5 μg RNA in 5.5 μL was mixed with 0.5 μL 2 μM RT_Nluc26_UM112_Read1Partial and 0.5 μL 10 mM dNTPs each. The RNA samples were then denatured at 65° C. for 5 min and chilled to 4° C. 3.5 μL reverse transcription mix was added to 10 μL total reaction volume: 2 μL 5× Superscript IV buffer, 0.5 μL 10 mM DTT, 0.5 μL Superase-In (ThermoFisher Scientific, AM2694), 0.5 μL Superscript IV (Thermo 18091050). The reaction was incubated at 55° C. for 45 min and inactivated at 80° C. for 10 min. The variant 5′ UTR amplicon was amplified from the reverse transcription reaction via PCR under the following reaction conditions: 4 μL RT reaction, 40 μL 2× Q5 Hot Start Master Mix (NEB M0494S), 0.8 μL 100× SYBR (Thermo S7563), 4 μL 10 μM T7_28_HBB_29_F, 4 μL 10 μM Nanoluc_ORF_R, in 80 μL total reaction volume. Cycling conditions were as follows: 98° C. for 60 sec, and 15 cycles of 98° C. for 10 sec, 68° C. for 10 sec, 72° C. for 10 sec. The PCR product was purified on silica columns (NEB T1034) and assembly with the Nluc_HBB_3UTR fragment was performed as described above for initial preparation of the IVT template using the HBB29_N35 amplicon. The mRNA was in vitro transcribed, capped and polyadenylated as described above. The same process of transfection, fractionation, reverse transcription, PCR amplification, assembly and in vitro transcription was repeated.

For sequencing library preparation, the RT reaction was PCR amplified under the following conditions: 1 μL RT reaction, 10 μL 2× Q5 Hot Start Master Mix (NEB M0494S), 0.2 μL 100× SYBR (Thermo S7563), 1 μL 10 μM Read1, 1 μL 10 μM Read2Partial_HBB29 in 20 μL total reaction volume. Cycling conditions were as follows: 98° C. for 60 sec, and 15 cycles of 98° C. for 10 sec, 68° C. for 10 sec, 72° C. for 10 sec. Sequencing adaptors were added using the following conditions for final round PCR: 1 μL first round PCR reaction, 10 μL 2× Q5 Hot Start Master Mix, 0.2 μL 100× SYBR, 1 μL 10 μM NEBNext Index Primer (NEB E7335, NEB E7500, NEB E7710, NEB E7730, NEB E6609), 1 μL 10 μM NEBNext Universal PCR Primer in 20 μL total volume. Cycling conditions were: 98° C. for 60 sec, and 5 cycles of 98° C. for 10 sec, 72° C. for 10 sec. All barcoded samples were then pooled at equal volumes and purified with 1.1× SPRIselect beads (Beckman Coulter B23317). Sequencing was performed at the Stanford Functional Genomics Facility (SFGF) at Stanford University, on the Illumina NextSeq 550 instrument, using a high output kit, 1×81 cycles.

Example 5: In Cell and In-Solution RNA Degradation Time Courses

Method: For in-cell RNA stability, the 233-member in vitro transcribed mRNA pool (m⁷G-capped and polyA) was transfected into HEK293T cells as described above and RNA was harvested at 1, 7, 12, and 24 h in Trizol (ThermoFisher Scientific, 15596026). RNA was extracted from the aqueous phase on silica columns (Zymo, R1013).

For in-solution RNA degradation experiments, 750 ng of the 233-mRNA pool (not m⁷G-capped or polyA) was incubated in 30 μL of Degradation Buffer (50 mM CHES at pH 10 and 10 mM MgCl₂) and collected over 10 time points: 0, 0.5, 1, 2, 3, 4, 5, 6, 16 and 24 h. To each sample, 15 μL of 0.5 M Tris-HCl pH 7 and 3 μL of 0.5 M EDTA-Na were added to quench the degradation. The integrity of each sample was checked by loading 5 μL of total RNA alongside a spike-in control (P4P62HP, 50 ng) onto a PAGE-Urea-TBE gel and visualizing with SYBR Gold (Thermo Fisher). Subsequently, RNA was purified using Ampure beads +40% polyethylene glycol 8000 (7:3), checked again by PAGE-Urea-TBE gel, and visualized with SYBR Gold.

Example 6: Measurement of In-Solution mRNA Stability by Capillary Electrophoresis

Method: For one-by-one measurement of in-solution mRNA stability, in vitro transcribed mRNA was incubated in a degradation buffer over ten time points (0, 0.5, 1.0, 1.5, 2, 3, 4, 5, 18, and 24 hours), then analyzed by capillary electrophoresis.

For each time point, 1.6 pmol of mRNA was brought to 10 μL in a buffer containing 50 mM Na-CHES at pH 10 with 10 mM MgCl₂, and the reaction was incubated at 25° C. When the incubation period was reached for each time point, 5 μL of Tris-HCl at pH 7 and 1 μL of 500 mM EDTA in nuclease-free water were added to quench the degradation reaction, and the sample was frozen for further analysis. After the final time point (24 hours), 4 μL of each mRNA degradation sample (out of a total stored volume of 16 μL) was taken and mixed with 1 μL of a control RNA at a concentration of 50 ng/μL. For these experiments we opted to use the P4-P6 domain of the Tetrahymena ribozyme with two added hairpins (˜239 nt) as a control. The RNA mixture was then purified using a mixture of AMPure XP beads (Beckman Coulter) with 40% polyethylene glycol (mixed in a 7:3 ratio). The resulting RNA was eluted into 4.5 μL of RNase-free water for analysis on the 2100 Bioanalyzer (Agilent) using the RNA-Nano Eukaryote protocol.

The data from the Bioanalyzer were analyzed using a custom script that performs the following analysis. We first converted elution times to nucleotides based on a ladder control (25, 200, 500, 1000, 2000, and 4000 nts). Relative mRNA amounts were estimated based on peak areas at expected band lengths (for example, ˜900 nucleotides for the mRNAs of interest and ˜265 nucleotides for the control). When calculating peak areas, background subtraction was performed, where the background was defined as the area under a straight line in the range of nucleotides used for the peak area. Normalization was performed using two different methods for cross-validation. First, the peak areas of full-length mRNA were normalized to the control P4-P6 domain RNA that was spiked into the samples after degradation was performed. Second, peak areas of full-length mRNAs were also normalized to the total amount of RNA in the lane less the peak area of the bands of interest (between ˜20-1000 nucleotides in our case), assuming that the majority of the other RNA in the lane was degradation product from the mRNA of interest. These distinct approaches to normalizing the data gave the same results within estimated error (see below). After calculation of normalized peak areas, fraction intact values were calculated for each mRNA by dividing the normalized area at each of the ten timepoints by the normalized area at the start (0 hours).
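The peak-area steps described above can be sketched as follows. This is not the custom script referred to in the text: the function name, the trapezoidal integration, the choice of a straight baseline joining the two end points of each window, and the trace values are all assumptions made for illustration. Only the first normalization method (division by the spiked-in control peak) is shown.

```python
def peak_area(sizes, signal, lo, hi):
    """Background-subtracted peak area within the size window [lo, hi].

    The area is integrated with the trapezoidal rule; the background is the
    area under the straight line joining the first and last points of the
    window (an assumed reading of the linear background).
    """
    pts = [(x, y) for x, y in zip(sizes, signal) if lo <= x <= hi]
    if len(pts) < 2:
        return 0.0
    total = sum((x1 - x0) * (y0 + y1) / 2.0
                for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    (xa, ya), (xb, yb) = pts[0], pts[-1]
    background = (xb - xa) * (ya + yb) / 2.0
    return total - background


# Hypothetical traces (size in nt, signal in arbitrary units) for two lanes.
sizes = [200, 250, 300, 800, 850, 900, 950, 1000]
lane_0h = [2, 4, 2, 1, 1, 5, 1, 1]
lane_5h = [2, 4, 2, 1, 1, 3, 1, 1]


def normalized_area(signal):
    """Full-length mRNA peak area divided by the control peak area."""
    return peak_area(sizes, signal, 800, 1000) / peak_area(sizes, signal, 200, 300)


fraction_intact_5h = normalized_area(lane_5h) / normalized_area(lane_0h)
```

Normalizing to a control added after degradation corrects for differences in recovery and loading between lanes, so that a change in the ratio reflects loss of full-length mRNA rather than sample handling.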

$\text{Fraction Intact}_{i} = \frac{\text{Normalized Area}_{i}}{\text{Normalized Area}_{0\ \text{hours}}}$

For each sample, fraction intact values were fit across the different timepoints to an exponential function:

$F_{i} = A e^{-t/\tau}$

where F_(i) is an array of fraction intact values across multiple time points, A is the amplitude of the exponential decay function, τ is the time constant, and t is an array of time points in hours. The time constant was then used to calculate the in vitro half-life of mRNA:

Half-life=ln(2)τ
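A minimal sketch of the half-life calculation is given below, assuming the decay form F = A·exp(−t/τ), which is the form consistent with Half-life = ln(2)τ. For simplicity the sketch substitutes a log-linear least-squares fit (a straight line through ln F versus t) for a nonlinear exponential fit; the two agree for noise-free data but weight noisy points differently. The function name and the synthetic data are hypothetical.

```python
import math


def fit_exponential(times, fractions):
    """Fit F = A * exp(-t / tau) by least squares on (t, ln F).

    Returns (A, tau). Non-positive fraction values are skipped because
    their logarithm is undefined.
    """
    pairs = [(t, math.log(f)) for t, f in zip(times, fractions) if f > 0]
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pairs)
    slope = sxy / sxx               # slope equals -1 / tau
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free time course with a 4 h time constant.
times = [0, 0.5, 1, 2, 3, 4, 5]
fractions = [math.exp(-t / 4.0) for t in times]
A, tau = fit_exponential(times, fractions)
half_life = math.log(2) * tau
```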

Example 7: Library Preparation and Amplicon Sequencinq

Method: Up to 250 ng RNA in 2.75 μL was mixed with 0.25 μL 2 μMRT_Const2_N12_Read1Partial and 0.25 μL 10 mM dNTPs each. The RNA sampleswere then denatured at 65° C. for 5 min and chilled to 4° C. 1.75 μLreverse transcription mix was added to 5 μL total reaction volume: 1 μL5× Superscript IV buffer, 0.25 μL 10 mM DTT, 0.25 μL Superase-In(ThermoFisher Scientific, AM2694), 0.25 μL Superscript IV (Thermo18091050). The reaction was incubated at 55° C. for 45 min andinactivated at 80° C. for 10 min.

First round PCR was performed under following conditions: 1 μL RTreaction, 10 μL 2× Q5 Hot Start Master Mix (NEB M0494S), 0.2 μL 100×SYBR(Thermo S7563), 1 μL 10 uM Read1Partial_F, 1 μL 10 uM 50:50Hbb_Fwd:Nluc_Fwd mix in 20 μL total volume. Cycling conditions were: 98°C. for 60 sec, and 15 cycles of 98° C. for 10 sec, 68° C. for 10 sec and72° C. Second round PCR was performed under the following conditions: 1μL first round PCR, 10 μL 2× Q5 Hot Start Master Mix, 0.2 μL 100×SYBR, 1μL 10 uM Read1Partial_F, 1 μL 10 uM Read2Partial_Const1_R in 20 μL totalvolume. Cycling conditions were: 98° C. for 60 sec, and 5 cycles of 98°C. for 10 sec, 72° C. for 5 sec. Sequencing adaptors were added usingthe following conditions for final round PCR: 1 μL second round PCR, 10μL 2× Q5 Hot Start Master Mix, 0.2 μL 100×SYBR, 1 μL 10 μM NEBNext IndexPrimer (NEB E7335, NEB E7500, NEB E7710, NEB E7730, NEB E6609), 1 μL 10μM NEBNext Universal PCR Primer in 20 μL total volume. Cyclingconditions were: 98° C. for 60 sec, and 5 cycles of 98° C. for 10 sec,72° C. for 5 sec. All barcoded samples were then pooled at equal volumesand purified with 1.1×SPRiselect beads (Beckman Coulter B23317).Sequencing was performed at the Stanford Functional Genomics Facility(SFGF) at Stanford University, on an Illumina NextSeq 550 instrument,using a high output kit, 1×76 cycles. The SEQ ID NOs for the various PCRprimers are listed in Table 4.

TABLE 4
Primer Sequences

Name:                          SEQ ID NO:
RT_Const2_N12_Read1Partial     1384
Const3_R                       1385
Hbb_Fwd                        1386
Nluc_Fwd                       1387
Read1Partial_F                 1388
Read2Partial_Const1_R          1389
T7_F_28nt (forward)            1390

Example 8: Amplicon Sequencing Data Analysis

Method: After bcl conversion and demultiplexing with Illumina bcl2fastq, the constant regions were trimmed using cutadapt. The trimmed reads were aligned to the indexed reference of barcode sequences using Bowtie2 with the following options: -L 11 -N 0 --nofw. The alignments were deduplicated based on UMIs using UMICollapse with -p 0.05 and counted using samtools idxstats. This pipeline yields a matrix of barcode read counts in which rows are the different constructs in the library and columns are the different samples.
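The final step of this pipeline, assembling per-sample counts into a construct-by-sample matrix, can be sketched as follows. This is an illustrative sketch only, assuming the standard tab-separated samtools idxstats layout (reference name, reference length, mapped reads, unmapped reads); the function names, sample labels, and barcode names are hypothetical.

```python
def parse_idxstats(text):
    """Return {reference_name: mapped_read_count} from idxstats-style text."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary row for reads without a reference
            continue
        counts[name] = int(mapped)
    return counts

def build_count_matrix(idxstats_by_sample):
    """Rows are constructs (barcodes); columns are samples."""
    samples = list(idxstats_by_sample)
    per_sample = {s: parse_idxstats(idxstats_by_sample[s]) for s in samples}
    barcodes = sorted({b for c in per_sample.values() for b in c})
    matrix = [[per_sample[s].get(b, 0) for s in samples] for b in barcodes]
    return barcodes, samples, matrix

# Hypothetical idxstats text for two samples, for illustration only.
example = {
    "t0": "bc1\t20\t100\t0\nbc2\t20\t50\t0\n*\t0\t0\t7\n",
    "t6": "bc1\t20\t40\t0\nbc2\t20\t45\t0\n*\t0\t0\t3\n",
}
barcodes, samples, matrix = build_count_matrix(example)
```

Constructs absent from a sample are given a count of zero here, which is a design choice of this sketch rather than a stated property of the pipeline.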

The count matrix was log transformed and normalized column-wise using a linear fit on the dilution series of spike-in constructs in each sample. For the calculation of RNA degradation coefficients in cells, we carried out a linear fit to log RNA abundance from the time course data, i.e. we fit an expression of Y = β₀ + β₁t, where Y is the normalized log RNA abundance and t is the number of hours after transfection; β₁ is the degradation constant. For the calculation of in-solution degradation coefficients, sufficient data points were available to carry out a nonlinear fit directly to an exponential model, i.e. an expression of y = A exp(−t/τ) was fit, where y is the fraction intact (RNA abundance normalized to initial abundance), A is the amplitude, t is the time of incubation in degradation buffer in hours, and τ is the degradation time constant. Time courses in which the observed fraction intact exceeded the fitted exponential by more than 0.05 at the last time point signaled RT-PCR amplification of misprimed, non-full-length products and were filtered out of downstream analysis.
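The in-cell linear fit and the last-time-point filter can be sketched as follows. This is a minimal illustration and not the analysis code used for the examples: it uses ordinary least squares for the linear fit, takes the exponential fit parameters (A and τ) as already-fitted inputs, and all function names and example values are hypothetical.

```python
import math

def linear_fit(t, y):
    """Ordinary least squares for y = beta0 + beta1 * t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_t
    return beta0, beta1

def passes_last_point_filter(t, observed, amplitude, tau, tolerance=0.05):
    """Keep a time course unless the last observed fraction intact exceeds
    the fitted exponential A * exp(-t / tau) by more than `tolerance`."""
    fitted_last = amplitude * math.exp(-t[-1] / tau)
    return observed[-1] - fitted_last <= tolerance

# Hypothetical normalized log abundances over hours after transfection.
hours = [1.0, 3.0, 6.0, 12.0]
log_abundance = [2.0 - 0.1 * h for h in hours]
beta0, beta1 = linear_fit(hours, log_abundance)
k_m = -beta1  # decay constant, in the units of the log transform used
```

The base of the log transform is not stated in the text, so the units of k_m in this sketch follow whichever base is applied to the abundances.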

For polysome profiles, percent RNA abundances for each fraction were first calculated by scaling per-fraction values by the sum of all fractions. For the heatmap displays in the figures, column medians were also subtracted from each percent RNA value. For the calculation of ribosome load, the matrix of percent RNA abundances in fractions 4-9 (1-3 are free RNP fractions, and >9 have negligible abundance) was first multiplied by a weight vector representing the number of ribosomes in each fraction as determined by the A260 trace from the fractionator, then the weighted abundances were summed across each row. For the calculation of polysome to monosome ratio, the summed abundance of fractions 7-9 (>3 ribosomes) was divided by fraction 4 (80S) abundance. For the calculation of monosome to 40S/60S ratio, fraction 4 (80S) abundance was divided by fraction 2 (40S/60S) abundance.
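The per-construct polysome metrics described above can be sketched as follows. This is an illustrative sketch for a single construct (one row of the matrix); the function names are hypothetical, and the weight vector shown is a placeholder, since the actual ribosome numbers per fraction were determined from the A260 trace.

```python
def percent_abundance(fraction_values):
    """Scale per-fraction values so that all fractions sum to 100."""
    total = sum(fraction_values)
    return [100.0 * v / total for v in fraction_values]

def ribosome_load(percent, weights, first_fraction=4, last_fraction=9):
    """Weighted sum over fractions 4-9; percent[0] corresponds to fraction 1."""
    selected = percent[first_fraction - 1:last_fraction]
    return sum(p * w for p, w in zip(selected, weights))

def polysome_to_monosome(percent):
    """Summed abundance of fractions 7-9 divided by fraction 4 (80S)."""
    return sum(percent[6:9]) / percent[3]

def monosome_to_subunit(percent):
    """Fraction 4 (80S) abundance divided by fraction 2 (40S/60S)."""
    return percent[3] / percent[1]

# Hypothetical values for fractions 1-10 of one construct.
raw = [10, 10, 10, 20, 10, 10, 10, 10, 5, 5]
# Placeholder ribosome numbers for fractions 4-9 (an assumption).
weights = [1, 2, 3, 4, 5, 6]

pct = percent_abundance(raw)
load = ribosome_load(pct, weights)
pm_ratio = polysome_to_monosome(pct)
ms_ratio = monosome_to_subunit(pct)
```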

To calculate the expected protein levels assuming first order kinetics of mRNA translation and mRNA/protein decay, the following differential equations were used:

$\frac{dM}{dt} = -k_{m} \cdot M(t)$

$\frac{dP}{dt} = k_{t} \cdot M(t) - k_{p} \cdot P(t)$

where dM/dt and dP/dt are rates of change in mRNA and protein levels, respectively; M(t) and P(t) are moles of mRNA and protein at time t, respectively; k_(t) is the translation rate constant; and k_(m) and k_(p) are rate constants of mRNA and protein decay, respectively. The analytical solution for P(t) is proportional to:

$P(t) \sim k_{t} \cdot \frac{m_{0}}{l} \cdot \frac{e^{-k_{p}t} - e^{-k_{m}t}}{k_{m} - k_{p}}$

where m₀ is the mass of mRNA present at t=0, and l is the mRNA length in nucleotides. k_(p) is set to 0 since Nluc protein has negligible degradation as measured by luciferase activity in transiently Nluc-expressing HEK293 cells for at least 6 hours after cycloheximide treatment, which allows assessment of protein degradation in the absence of further translation⁹⁹. k_(m) is the degradation constant obtained from the linear fit of in-cell time course RNA data (−β₁ above). k_(t) is the ribosome load calculated by summing weighted RNA abundances from polysome profile data.
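The expected protein level can be evaluated numerically as sketched below. This is a minimal illustration under stated assumptions: the initial moles of mRNA are taken to scale as m₀/l, k_(m) is assumed to be expressed per hour on a natural-log scale, and the function name and example values are hypothetical. With k_(p) set to 0, the expression reduces to k_(t)·(m₀/l)·(1 − exp(−k_(m)·t))/k_(m).

```python
import math

def expected_protein(t, k_t, k_m, m0, length, k_p=0.0):
    """Relative protein level at time t for first-order translation and decay.

    Returns a value proportional to P(t); m0 / length stands in for the
    initial moles of mRNA (an assumption of this sketch).
    """
    moles_proxy = m0 / length
    if abs(k_m - k_p) < 1e-12:
        # Limit of the general expression as k_m approaches k_p.
        return k_t * moles_proxy * t * math.exp(-k_m * t)
    decay_term = (math.exp(-k_p * t) - math.exp(-k_m * t)) / (k_m - k_p)
    return k_t * moles_proxy * decay_term

# Hypothetical inputs: ribosome load 2.0, k_m 0.1 per hour,
# 1000 mass units of a 500-nt mRNA, evaluated 6 hours after transfection.
p_6h = expected_protein(t=6.0, k_t=2.0, k_m=0.1, m0=1000.0, length=500.0)
```

Because the result is only proportional to P(t), it is suited to ranking constructs against one another rather than to predicting absolute protein amounts.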

Example 9: Luciferase Activity Assay after mRNA Transfection

Method: Media from transiently transfected HEK293T cells was aspirated and cells were lysed in 40 μL of 1× passive lysis buffer from the Dual-Luciferase Reporter Assay System (Promega, E1980) and either directly assayed or frozen at −20° C. After thawing, 20 μL of supernatant was transferred to a new plate and assayed for luciferase activity using the Nano-Glo Dual-Luciferase Reporter Assay System (Promega, N1610) to measure Firefly (Fluc) and NanoLuc (Nluc) luciferase activities. In particular, 50 μL of ONE-Glo Ex Reagent was added to each well of lysate and incubated for 3 minutes at room temperature before measuring Fluc activities. Subsequently, 50 μL of NanoDLR Stop & Glo reagent was added to each well, and incubated for 10 min at room temperature before measuring luciferase activities on a GloMax-Multi (Promega) plate reader. Luciferase reporter activity is expressed as a ratio between Nluc and Fluc. Each experiment was performed a minimum of three independent times. Because this assay relies on accumulation of luciferase in the cytosol, any signal peptide sequences were removed from the CDS for templates and mRNA for these transfection and luciferase activity experiments.

Example 10: Polysome Selection Library Sequencing Data Analysis

Method: Following adapter trimming, 670,440 sequences with a summed read count of at least 10 across all libraries combined were set as the reference. Each library was aligned to this indexed reference using Bowtie2. Only uniquely mapping reads with edit distance ≤3 were retained. Alignments were further deduplicated using UMICollapse (-p 0.05, -k 1). This results in a matrix of read counts in which rows are different sequence variants and columns are the samples.

Normalized counts were obtained by dividing the matrix column-wise by total read counts per sample. For sequence variants with at least 15 reads in any one of the samples, a regression model was fitted on normalized read counts with the sequential selection rounds as ordinal predictors, penalizing differences between coefficients of adjacent groups (R package ordPens). False discovery rate was estimated by the Benjamini-Hochberg procedure. For choosing the final set of candidates, the criteria of 15 read counts in the final round polysome selection library and 2-fold enrichment over input in the final round were also required.
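The normalization, false discovery rate adjustment, and final candidate criteria can be sketched as follows. This is an illustrative sketch only: the penalized ordinal regression (R package ordPens) is not reproduced, so p-values are taken as given inputs, the thresholds are interpreted as minimums, enrichment is computed on normalized counts, and all function names and example values are hypothetical.

```python
def normalize_columns(matrix):
    """Divide each column (sample) by its total read count."""
    n_cols = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [[row[j] / totals[j] for j in range(n_cols)] for row in matrix]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def is_candidate(final_count, final_norm, input_norm,
                 min_reads=15, min_fold=2.0):
    """Apply the read-count and fold-enrichment criteria (read as minimums)."""
    return final_count >= min_reads and final_norm >= min_fold * input_norm

# Hypothetical counts: rows are variants, columns are input and final round.
counts = [[10, 30], [30, 10]]
normalized = normalize_columns(counts)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```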

DOCTRINE OF EQUIVALENTS

Having described several embodiments, it will be recognized by those skilled in the art that various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the invention. Additionally, a number of well-known processes and elements have not been described in order to avoid unnecessarily obscuring the present invention. Accordingly, the above description should not be taken as limiting the scope of the invention.

Those skilled in the art will appreciate that the foregoing examples and descriptions of various preferred embodiments of the present invention are merely illustrative of the invention as a whole, and that variations in the components or steps of the present invention may be made within the spirit and scope of the invention. Accordingly, the present invention is not limited to the specific embodiments described herein, but, rather, is defined by the scope of the appended claims.

1-33. (canceled)
 34. A method for identifying RNA molecules possessing increased translatability and stability, comprising: obtaining a pool of RNA molecules, wherein each RNA molecule is uniquely encoded with a barcoding sequence and each barcoding sequence is flanked by at least one profiling sequence; assessing translatability of the pool of RNA molecules by: transfecting a cell or cell lysate with a first subset of the pool of RNA molecules; performing polysome profiling on the first subset of the pool of RNA molecules to segregate RNA molecules based on the number of ribosomes bound to the RNA molecule; and isolating a fraction from the polysome profile to generate a first set of RNA molecules showing a first level of ribosomes bound to the RNA molecules in the set of RNA molecules; and assessing stability of the pool of RNA molecules by: treating a second subset of the pool of RNA molecules under an experimental condition; and isolating a fraction from the second subset of the pool of RNA molecules at a specified timepoint to generate a second set of RNA molecules showing stability under the experimental condition for the specified timepoint.
 35. The method of claim 34, further comprising sequencing the barcode sequence of the first set of RNA molecules and the second set of RNA molecules to identify the presence of each RNA molecule in each fraction of RNA molecules.
 36. The method of claim 35, further comprising determining translatability and stability of the RNA molecules associated with each barcode sequence in the first set of RNA molecules and the second set of RNA molecules by identifying the prevalence of each barcode in each fraction of RNA molecules.
 37. The method of claim 34, wherein the barcoding sequence is selected from SEQ ID NOs: 115-1380.
 38. The method of claim 34, wherein the profiling sequence is selected from SEQ ID NOs: 1381-1382.
 39. A method to select for RNA elements, comprising: obtaining a library of RNA molecules, wherein each RNA molecule comprises a coding sequence, a 5′ untranslated region (5′UTR), and a 3′ untranslated region (3′UTR), wherein one of the coding sequence, the 5′UTR, or the 3′UTR comprises a degenerate region; assessing a property of the library of RNA molecules, wherein the property is selected from the group consisting of translatability, in vivo stability, and in vitro stability; and selecting an RNA molecule from the library of RNA molecules showing increase in the property over other RNA molecules in the library of RNA molecules.
 40. The method of claim 39, further comprising sequencing the selected RNA molecule.
 41. The method of claim 39, wherein the selected RNA molecule is a pool of RNA molecules.
 42. The method of claim 41, further comprising: reassessing the property of the pool of RNA molecules; and selecting an RNA molecule from the pool of RNA molecules showing increase in the property over other RNA molecules in the pool of RNA molecules.
 43. The method of claim 42, further comprising sequencing the selected RNA molecule from the pool of RNA molecules.
 44. The method of claim 39, wherein the property is translatability.
 45. The method of claim 39, wherein the degenerate region is selected from the group consisting of: a deletion, a random sequence, an ambiguous sequence, and a truncation.
 46. The method of claim 34, wherein the RNA molecules are transfected into a collection of cells.
 47. The method of claim 46, wherein the collection of cells is selected from mammalian cells, yeast cells, bacteria cells, and plant cells.
 48. The method of claim 34, wherein the RNA molecules are added to a cell lysate.
 49. The method of claim 34, wherein polysome profiling comprises adding a cell lysate to a sucrose gradient and centrifuging the sucrose gradient to segregate the RNA molecules.
 50. The method of claim 34, further comprising isolating a second fraction from the polysome profile to generate a second set of RNA molecules showing a second level of ribosomes bound to the RNA molecules in the set of RNA molecules, wherein the first level and second level represent different amounts of bound ribosomes.
 51. The method of claim 50, further comprising sequencing the barcode sequence of each RNA molecule in the first set of RNA molecules and the second set of RNA molecules to identify the presence of each RNA molecule in the first set of RNA molecules and the second set of RNA molecules.
 52. The method of claim 34, wherein the treatment condition is selected from temperature, pH, presence of certain molecules, presence of certain ions, concentration of certain molecules, concentration of certain ions, irradiation, buffer type, and buffer concentration.
 53. The method of claim 34, further comprising size selecting for full-length RNA molecules.